Releases/gcc 12 #65

jacopobrusini · 2022-06-04T17:20:00Z

Support for Apple Silicon!!!

jwakely · 2024-02-21T00:10:33Z

This is an unofficial mirror that has nothing to do with the GCC project, so submitting pull requests here is a waste of time.

Also, I have no idea what this pull request is trying to do but it would never be accepted even if it was submitted to the right place.

There are several typos in the AVX512 intrinsic macro definitions. Correct them to fix errors when compiling with -O0.

gcc/ChangeLog:

	* config/i386/avx512dqintrin.h (_mm_mask_fpclass_ss_mask): Correct operand order.
	(_mm_mask_fpclass_sd_mask): Ditto.
	(_mm256_maskz_reduce_round_ss): Use __builtin_ia32_reducess_mask_round instead of __builtin_ia32_reducesd_mask_round.
	(_mm_reduce_round_sd): Use -1 as mask since it is the non-masked form.
	(_mm_reduce_round_ss): Ditto.
	* config/i386/avx512vlbwintrin.h (_mm256_mask_alignr_epi8): Correct operand usage.
	(_mm_mask_alignr_epi8): Ditto.
	* config/i386/avx512vlintrin.h (_mm_mask_alignr_epi64): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512bw-vpalignr-1b.c: New test.
	* gcc.target/i386/avx512dq-vfpclasssd-1b.c: Ditto.
	* gcc.target/i386/avx512dq-vfpclassss-1b.c: Ditto.
	* gcc.target/i386/avx512dq-vreducesd-1b.c: Ditto.
	* gcc.target/i386/avx512dq-vreducess-1b.c: Ditto.
	* gcc.target/i386/avx512vl-valignq-1b.c: Ditto.
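The remark about passing -1 as the mask for the non-masked reduce intrinsics can be pictured with a small scalar model of merge-masking. This is only a sketch under stated assumptions: `mask_merge` and the 8-lane layout are hypothetical names chosen for illustration, nothing here comes from the GCC headers, and no real intrinsic or builtin is called.

```c
#include <stdint.h>

#define LANES 8

/* Hypothetical scalar model of merge-masking (not GCC code): lane i of
   OUT takes COMPUTED[i] when bit i of MASK is set, else PASSTHRU[i].  */
void mask_merge(uint8_t mask, const int *computed, const int *passthru,
                int *out)
{
    for (int i = 0; i < LANES; i++)
        out[i] = ((mask >> i) & 1) ? computed[i] : passthru[i];
}
```

In this model, calling `mask_merge((uint8_t)-1, ...)` selects the computed value in every lane, which is why an all-ones mask behaves like the unmasked operation, while a mask of 0 returns the pass-through values unchanged.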

…13/12

In GCC13/12, there is no _mm_avx512_setzero_ps/d since it was introduced in GCC14.

gcc/ChangeLog:

	* config/i386/avx512dqintrin.h (_mm_reduce_round_sd): Use _mm_setzero_pd instead of _mm_avx512_setzero_pd.
	(_mm_reduce_round_ss): Use _mm_setzero_ps instead of _mm_avx512_setzero_ps.

2024-07-18 Paul Thomas <[email protected]>

gcc/fortran
	PR fortran/108889
	* gfortran.h: Add bit field 'allocated_in_scope' to gfc_symbol.
	* trans-array.cc (gfc_array_allocate): Set 'allocated_in_scope' after allocation if not a component reference.
	(gfc_alloc_allocatable_for_assignment): If 'allocated_in_scope' not set, not a component ref and not allocated, set the array bounds and offset to give zero length in all dimensions. Then set allocated_in_scope.

gcc/testsuite/
	PR fortran/108889
	* gfortran.dg/pr108889.f90: New test.

(cherry picked from commit c3aa339)

2024-07-19 Paul Thomas <[email protected]>

libgomp/ChangeLog

	* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Cut dg-note about 'a' and remove bogus warnings about its array descriptor components being used uninitialized.

(cherry picked from commit 8d6994f)

This was an interesting compare-debug failure to debug. The first symptom was in gcse, which would create pseudo-registers in a different order. This was caused by a different order of a hash table walk, due to the hash table having a different number of entries, which in turn was due to the max insn uid being different between the two runs. The difference in max insn uid came from sh_recog_treg_set_expr, which is called via rtx_costs; fwprop would call rtx_costs in some cases for debug insn related stuff.

Built and tested for sh4-linux-gnu.

	PR target/116189

gcc/ChangeLog:

	* config/sh/sh.cc (sh_recog_treg_set_expr): Don't call make_insn_raw, make the insn with a fake uid.

gcc/testsuite/ChangeLog:

	* c-c++-common/torture/pr116189-1.c: New test.

Signed-off-by: Andrew Pinski <[email protected]>
(cherry picked from commit 0355c94)
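How one extra entry can change the order of a hash table walk can be shown with a toy open-addressed table. This is a sketch only: `walk_order`, `MAX_SLOTS` and the linear-probing scheme are assumptions made for illustration and are not GCC's hash table implementation.

```c
#include <stddef.h>

#define MAX_SLOTS 16
#define EMPTY (-1)

/* Toy model (not GCC's hash_table): insert N non-negative KEYS into an
   open-addressed table of CAP slots using linear probing, then walk the
   slots in index order and copy the keys found into OUT.
   Returns the number of keys written, or 0 on invalid arguments.  */
size_t walk_order(const int *keys, size_t n, size_t cap, int *out)
{
    int slots[MAX_SLOTS];
    size_t count = 0;

    if (cap == 0 || cap > MAX_SLOTS || n > cap)
        return 0;
    for (size_t i = 0; i < cap; i++)
        slots[i] = EMPTY;

    for (size_t k = 0; k < n; k++) {
        size_t i = (size_t)keys[k] % cap;
        while (slots[i] != EMPTY)      /* probe for the next free slot */
            i = (i + 1) % cap;
        slots[i] = keys[k];
    }

    for (size_t i = 0; i < cap; i++)   /* the "walk" */
        if (slots[i] != EMPTY)
            out[count++] = slots[i];
    return count;
}
```

With 5 slots, the keys 10, 3, 7 are walked as 10, 7, 3. Inserting one extra key 2 first makes 7 collide and move, so the walk becomes 10, 2, 3, 7 and the relative order of 3 and 7 flips. Any pass that creates new objects while walking such a table would then number them differently between the two runs.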

…ization

The constant C must be an integral multiple of the shift value in the above optimization. Non-integral values can occur when evaluating IMAGPART_EXPR when the shadd constant is 8 and we have SFmode.

2024-08-06 John David Anglin <[email protected]>

gcc/ChangeLog:

	PR target/113384
	* config/pa/pa.cc (hppa_legitimize_address): Add check to ensure constant is an integral multiple of the shift value.
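The divisibility requirement can be written as a one-line check. This is a minimal sketch, not the code added to hppa_legitimize_address: `offset_fits_shadd` is a hypothetical helper, and the example values assume that SFmode is 4 bytes wide, so the imaginary part of a complex float sits at byte offset 4.

```c
#include <stdbool.h>

/* Hypothetical helper (not from pa.cc): true when OFFSET can be written
   as index * SCALE for some integer index, i.e. when OFFSET is an
   integral multiple of the shadd scale.  */
bool offset_fits_shadd(long offset, long scale)
{
    return scale > 0 && offset % scale == 0;
}
```

For a shadd constant of 8, `offset_fits_shadd(4, 8)` is false, which is the SFmode IMAGPART_EXPR case described above, while `offset_fits_shadd(16, 8)` is true.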

For the pattern below, RA may still allocate r162 as a v/k register and try to reload the address with

leaq __libc_tsd_CTYPE_B@gottpoff(%rip), %rsi

which results in a linker error.

(set (reg:DI 162)
     (mem/u/c:DI
       (const:DI (unspec:DI
		 [(symbol_ref:DI ("a") [flags 0x60] <var_decl 0x7f621f6e1c60 a>)]
		 UNSPEC_GOTNTPOFF))

Quote from H.J. on why the linker issues an error:

>What do these do:
>
> leaq __libc_tsd_CTYPE_B@gottpoff(%rip), %rax
> vmovq (%rax), %xmm0
>
>From x86-64 TLS psABI:
>
>The assembler generates for the x@gottpoff(%rip) expressions a R_X86_64_GOTTPOFF
>relocation for the symbol x which requests the linker to
>generate a GOT entry with a R_X86_64_TPOFF64 relocation. The offset of
>the GOT entry relative to the end of the instruction is then used in
>the instruction. The R_X86_64_TPOFF64 relocation is processed at
>program startup time by the dynamic linker by looking up the symbol x
>in the modules loaded at that point. The offset is written in the GOT
>entry and later loaded by the addq instruction.
>
>The above code sequence looks wrong to me.

gcc/ChangeLog:

	PR target/116043
	* config/i386/constraints.md (Bk): Refine to define_special_memory_constraint.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr116043.c: New test.

(cherry picked from commit bc1fda0)

Not sure how this happened, but: svsudot is supposed to be expanded as USDOT with the operands swapped. However, a thinko in the expansion of svsudot meant that the arguments weren't in fact swapped; the attempted swap was just a no-op. And the testcases blithely accepted that.

gcc/
	PR target/114607
	* config/aarch64/aarch64-sve-builtins-base.cc (svusdot_impl::expand): Fix botched attempt to swap the operands for svsudot.

gcc/testsuite/
	PR target/114607
	* gcc.target/aarch64/sve/acle/asm/sudot_s32.c: New test.

(cherry picked from commit 2c1c248)
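One generic way an attempted swap can silently do nothing is a rotate over a half-open range that covers only a single element. The commit message does not say whether the svsudot bug had this shape, so the following is purely an illustration; `rotate_left` is a hypothetical helper, not a function from the aarch64 backend.

```c
#include <stddef.h>

/* Hypothetical helper: rotate the elements in the half-open range
   [START, END) of ARGS left by one position.  A range holding zero or
   one element is left untouched, so such a call is a no-op.  */
void rotate_left(int *args, size_t start, size_t end)
{
    if (end <= start + 1)
        return;
    int first = args[start];
    for (size_t i = start; i + 1 < end; i++)
        args[i] = args[i + 1];
    args[end - 1] = first;
}
```

Given `int v[3] = {0, 1, 2}`, the call `rotate_left(v, 1, 2)` changes nothing, whereas `rotate_left(v, 1, 3)` swaps the last two elements. A test that only checks that the code assembles, and not which operand ends up where, would accept both.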

Initializing a vector using

   Vec : V.Vector := [Some_Type'(Some_Abstract_Type with F => 0)];

may crash the compiler. The expander marks the N_Extension_Aggregate for delayed expansion, which never happens, and the node incorrectly ends up in gigi. The delayed expansion is needed for nested aggregates, which the original code is testing for, but container aggregates are handled differently. Such assignments to container aggregates are later transformed into procedure calls to the procedures named in the Aggregate aspect definition, for which the delayed expansion is not required/expected.

gcc/ada/
	PR ada/118234
	* exp_aggr.adb (Convert_To_Assignments): Do not mark node for delayed expansion if parent type has the Aggregate aspect.
	* sem_util.adb (Is_Container_Aggregate): Move...
	* sem_util.ads (Is_Container_Aggregate): ... here and make it public.

This just applies the same fix to Expand_Array_Aggregate as the one that was recently applied to Convert_To_Assignments.

gcc/ada/
	PR ada/118234
	* exp_aggr.adb (Convert_To_Assignments): Tweak comment.
	(Expand_Array_Aggregate): Do not delay the expansion if the parent node is a container aggregate.

This handles the case where a component association is present.

gcc/ada/
	PR ada/118234
	* exp_aggr.adb (Convert_To_Assignments): In the case of a component association, call Is_Container_Aggregate on the parent's parent.
	(Expand_Array_Aggregate): Likewise.

gcc/ada * libgnarl/s-taprop__dummy.adb: Remove use clause for System.Parameters. (Unlock): Remove Global_Lock formal parameter. (Write_Lock): Likewise.

This patch adds support for a new fusion in znver5 documented in the optimization manual:

   The Zen5 microarchitecture adds support to fuse reg-reg MOV Instructions with certain ALU instructions. The following conditions need to be met for fusion to happen:
     - The MOV should be reg-reg mov with Opcode 0x89 or 0x8B
     - The MOV is followed by an ALU instruction where the MOV and ALU destination register match.
     - The ALU instruction may source only registers or immediate data. There cannot be any memory source.
     - The ALU instruction sources either the source or dest of MOV instruction.
     - If ALU instruction has 2 reg sources, they should be different.
     - The following ALU instructions can fuse with an older qualified MOV instruction:
       ADD ADC AND XOR OP SUB SBB INC DEC NOT SAL / SHL SHR SAR

(I assume OP is OR)

I also increased the issue rate from 4 to 6. Theoretically znver5 can do more, but with our model we can't really use it. Increasing the issue rate to 8 leads to an infinite loop in the scheduler.

Finally, I also enabled fuse_alu_and_branch since it is supported by znver5 (I think by earlier zens too).

The new fusion pattern moves quite a few instructions around in common code:

@@ -2210,13 +2210,13 @@
 	.cfi_offset 3, -32
 	leaq	63(%rsi), %rbx
 	movq	%rbx, %rbp
+	shrq	$6, %rbp
+	salq	$3, %rbp
 	subq	$16, %rsp
 	.cfi_def_cfa_offset 48
 	movq	%rdi, %r12
-	shrq	$6, %rbp
-	movq	%rsi, 8(%rsp)
-	salq	$3, %rbp
 	movq	%rbp, %rdi
+	movq	%rsi, 8(%rsp)
 	call	_Znwm
 	movq	8(%rsp), %rsi
 	movl	$0, 8(%r12)
@@ -2224,8 +2224,8 @@
 	movq	%rax, (%r12)
 	movq	%rbp, 32(%r12)
 	testq	%rsi, %rsi
-	movq	%rsi, %rdx
 	cmovns	%rsi, %rbx
+	movq	%rsi, %rdx
 	sarq	$63, %rdx
 	shrq	$58, %rdx
 	sarq	$6, %rbx

which should help decoder bandwidth and perhaps also cache, though I was not able to measure off-noise effect on SPEC.

gcc/ChangeLog:

	* config/i386/i386.h (TARGET_FUSE_MOV_AND_ALU): New tune.
	* config/i386/x86-tune-sched.cc (ix86_issue_rate): Update for znver5.
	(ix86_adjust_cost): Add TODO about znver5 memory latency.
	(ix86_fuse_mov_alu_p): New.
	(ix86_macro_fusion_pair_p): Use it.
	* config/i386/x86-tune.def (X86_TUNE_FUSE_ALU_AND_BRANCH): Add ZNVER5.
	(X86_TUNE_FUSE_MOV_AND_ALU): New tune.

(cherry picked from commit e2125a6)
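The fusion conditions quoted from the optimization manual can be read as a simple predicate over an adjacent MOV/ALU pair. The sketch below is not the ix86_fuse_mov_alu_p implementation, which works on RTL; `struct mov_insn`, `struct alu_insn`, `enum alu_op` and `can_fuse_mov_alu` are hypothetical types and names, registers are modelled as plain integers, and `src_regs` is assumed to list every register the ALU instruction reads (including its destination when that is also an input).

```c
#include <stdbool.h>

/* ALU operations named in the manual excerpt; ALU_OTHER is anything else.  */
enum alu_op { ALU_ADD, ALU_ADC, ALU_AND, ALU_XOR, ALU_OR, ALU_SUB, ALU_SBB,
              ALU_INC, ALU_DEC, ALU_NOT, ALU_SHL, ALU_SHR, ALU_SAR,
              ALU_OTHER };

struct mov_insn {
    int dest, src;          /* register numbers */
    bool reg_to_reg;        /* opcode 0x89 or 0x8B with register operands */
};

struct alu_insn {
    enum alu_op op;
    int dest;               /* destination register */
    int src_regs[2];        /* registers read by the instruction */
    int nsrc_regs;          /* how many entries of src_regs are valid */
    bool has_mem_source;    /* true if any source operand is memory */
};

/* Hypothetical predicate: does ALU, placed right after MOV, satisfy the
   listed conditions?  */
bool can_fuse_mov_alu(const struct mov_insn *mov, const struct alu_insn *alu)
{
    if (!mov->reg_to_reg || alu->op == ALU_OTHER)
        return false;
    if (alu->dest != mov->dest)          /* destinations must match */
        return false;
    if (alu->has_mem_source)             /* registers or immediates only */
        return false;
    if (alu->nsrc_regs < 1 || alu->nsrc_regs > 2)
        return false;

    bool reads_mov_reg = false;          /* must read MOV's source or dest */
    for (int i = 0; i < alu->nsrc_regs; i++)
        if (alu->src_regs[i] == mov->src || alu->src_regs[i] == mov->dest)
            reads_mov_reg = true;
    if (!reads_mov_reg)
        return false;

    /* Two register sources must be different registers.  */
    if (alu->nsrc_regs == 2 && alu->src_regs[0] == alu->src_regs[1])
        return false;
    return true;
}
```

Under this model the pair `movq %rbx, %rbp` followed by `shrq $6, %rbp` from the diff qualifies: the destinations match, the shift reads only `%rbp` and an immediate, and SHR is in the list.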

Zen5 has 6 instead of 4 ALUs and the integer multiplication can now execute in 3 of them. FP units can do 2 additions and 2 multiplications with latency 2 and 3. This patch updates reassociation width accordingly. This has the potential of increasing register pressure, but unlike when benchmarking znver1 tuning, I did not notice this actually causing problems on SPEC, so this patch bumps up reassociation width to 6 for everything except integer vectors, where there are 4 units with typical latency of 1.

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

	* config/i386/i386.cc (ix86_reassociation_width): Update for Znver5.
	* config/i386/x86-tune-costs.h (znver5_costs): Update reassociation widths.

(cherry picked from commit f0ab3de)
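Reassociation width can be pictured as the number of independent accumulator chains a long sum is split into, so that several execution units can work in parallel at the cost of more live registers. The following is a sketch of that idea in source form, not what the reassociation pass emits; `sum_reassoc` and `MAX_WIDTH` are hypothetical names, and integer addition is assumed so that regrouping cannot change the result.

```c
#include <stddef.h>

#define MAX_WIDTH 8

/* Hypothetical illustration: sum N values using WIDTH independent
   accumulators.  Each accumulator forms its own dependency chain, so a
   wider split shortens the longest chain but keeps more values live.
   An out-of-range WIDTH falls back to a single chain.  */
long sum_reassoc(const long *v, size_t n, size_t width)
{
    long acc[MAX_WIDTH] = {0};

    if (width == 0 || width > MAX_WIDTH)
        width = 1;
    for (size_t i = 0; i < n; i++)
        acc[i % width] += v[i];

    long total = 0;                  /* combine the partial sums */
    for (size_t w = 0; w < width; w++)
        total += acc[w];
    return total;
}
```

With `width` set to 1 this is the original serial chain; with 6 it corresponds to the split the patch selects for most operations on Zen5. The extra accumulators are where the register pressure mentioned above comes from.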

gcc/ChangeLog: * doc/cpp.texi (Common Predefined Macros): Fix syntax.

The following makes analysis and transform agree on constraints.

	PR tree-optimization/115646
	* tree-call-cdce.cc (check_pow): Check for bit_sz values as allowed by transform.

	* gcc.dg/pr115646.c: New testcase.

(cherry picked from commit 453b1d2)

The following avoids associating a reduction path as that might get STMT_VINFO_REDUC_IDX out-of-sync with the SLP operand order. This is a latent issue with SLP reductions but now easily exposed as we're doing single-lane SLP reductions. Once we have achieved SLP only, we can move and update this meta-data.

	PR tree-optimization/115669
	* tree-vect-slp.cc (vect_build_slp_tree_2): Do not reassociate chains that participate in a reduction.

	* gcc.dg/vect/pr115669.c: New testcase.

(cherry picked from commit 7886830)

The following fixes an issue with CCP's likely_value when faced with a vector CTOR containing undef SSA names and constants. This should be classified as CONSTANT and not UNDEFINED.

	PR tree-optimization/116057
	* tree-ssa-ccp.cc (likely_value): Also walk CTORs in stmt operands to look for constants.

	* gcc.dg/torture/pr116057.c: New testcase.

(cherry picked from commit 1ea5515)

atahanozbayram approved these changes Apr 2, 2024

GCC Administrator and others added 28 commits July 28, 2024 00:20

Daily bump.

b110b66

Daily bump.

fc5b4e9

Daily bump.

16ea079

Daily bump.

3e6b076

Daily bump.

b0137fe

Daily bump.

c7d2a41

Daily bump.

77c8a37

Daily bump.

2c6bfcd

Daily bump.

13e6f13

Daily bump.

0e2a2b9

Daily bump.

df772cc

Daily bump.

2ad59ea

Daily bump.

07d7487

Daily bump.

3a3c11c

Daily bump.

4b81821

Daily bump.

0a59397

Daily bump.

1d64d01

Daily bump.

399acac

Daily bump.

0906e9b

Daily bump.

62b4f08

GCC Administrator and others added 30 commits December 22, 2024 00:21

Daily bump.

a03684e

Daily bump.

7456ecb

Daily bump.

1eff0e2

Daily bump.

2a46fb4

Daily bump.

421e27e

Daily bump.

bf7987c

Daily bump.

8ddbec5

Daily bump.

afeeda0

Daily bump.

8c9f8cd

Daily bump.

0268510

Daily bump.

36a977e

Daily bump.

df5e5d0

Daily bump.

2eabd17

Daily bump.

0b79699

Daily bump.

91e5cb0

Ada: Fix build for dummy s-taprop

85319f4

gcc/ada * libgnarl/s-taprop__dummy.adb: Remove use clause for System.Parameters. (Unlock): Remove Global_Lock formal parameter. (Write_Lock): Likewise.

Daily bump.

465e7ac

Daily bump.

5a49cbe

Daily bump.

3cbf717

Daily bump.

f3f351a

Daily bump.

f075683

doc: cpp: fix version test example syntax

36db43d

gcc/ChangeLog: * doc/cpp.texi (Common Predefined Macros): Fix syntax.
